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1.  INTRODUCTION 


Sever*]  different  kinds  of  stationary  and  nonstationary  time  series  modeling  problems  are 
considered  here  from  a  Bayesian- smoothness  priors  approach.  The  smoothness  priors  specify  the 
prior  distribution  of  the  time  series  model  parameters. 

The  term  "smoothness  priors"  is  very  likely  due  to  Shiller  (1973).  Shiller  modeled  the  distri¬ 
buted  lag  (impulse  response)  relationship  between  the  input  and  output  of  economic  time  series 
under  difference  equation  "smoothness"  constraints  on  the  distributed  lags.  A  tradeoff  of  the 
goodness-of-fit  of  the  solution  to  the  data  and  the  goodness-of-fit  of  the  solution  to  a  smoothness 
constraint  was  determined  by  a  single  smoothness  tradeoff  parameter.  Shiller  did  not  offer  an 
objective  method  of  choosing  the  smoothness  tradeoff  parameter.  Akaike,  (1980),  completed  the 
analysis  initiated  by  Shiller.  Akaike  developed  and  exploited  the  concept  of  the  likelihood  of  the 
Bayesian  model  and  used  a  maximisation  of  the  likelihood  procedure  for  determining  the  smooth¬ 
ness  tradeoff  parameter.  (In  Bayesian  terminology,  the  smoothness  tradeoff  parameter  is  referred 
to  as  the  "hyperparameter",  Lindley  and  Smith,  1972.)  The  origin  of  Shiller- Akaike  smoothness 
priors  can  be  seen  in  a  smoothing  problem  posed  by  Whittaker  (192S).  The  smoothing  problem 
context  is  now  understood  to  be  common  to  a  variety  of  other  statistical  data  analysis  problems 
including  density  estimation  and  image  analysis  (Titterington  1985). 

In  the  problem  treated  by  Whittaker,  the  observations  J/n,n  =  l,...,jV  are  given.  They  are 
assumed  to  consist  of  the  sum  of  a  "smooth"  function  and  observation  noise, 

Vn  =  fn  +  (M) 

The  problem  is  to  estimate  the  unknown  fn,n  =  l,...,N.  In  a  time  series  interpretation  of  this 
problem,  =  is  the  trend  of  a  nonstationary  mean  time  series.  A  typical  approach  to 

this  problem  is  to  use  a  class  of  parametric  models.  The  quality  of  the  analysis  is  completely 
dependent  upon  the  appropriateness  of  the  assumed  model  class.  A  flexible  model  is  desirable.  In 
this  context,  Whittaker  suggested  that  the  solution  balance  a  tradeoff  of  goodness  of  fit  to  the  data 
and  goodness  of  fit  to  a  smoothness  criterion.  This  idea  was  realised  by  minimizing 
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for  some  appropriately  chosen  smoothness  tradeoff  parameter  fj}.  In  (1.2)  V*  fn  expresses  a  kth- 
order  difference  constraint  on  the  solution  f,  with  V/„  =  /„  —  fn-\,  V/2  =  V(V/n)>  etc- 
(Whittaker’s  original  solution  was  not  expressed  in  a  Bayesian  context.  Whittaker  and  Robinson 
(1924)  does  invoke  a  Bayesian  interpretation  of  this  problem.) 

The  properties  of  the  solution  to  the  problem  in  (1.1)-(1.2)  are  clear.  If  /i2=0,  fn  =  yn 

2 

and  the  solution  is  a  replica  of  the  observations.  As  ft  becomes  increasingly  large,  the  smoothness 

2 

constraint  dominates  the  solution  and  the  solution  satisifies  a  kth  order  constraint.  For  large  ft 

and  k—  1,  the  solution  is  a  constant,  for  k= 2,  it  is  a  straight  line  etc..  Whittaker  left  the  choice 
2 

of  ft  to  the  investigator. 

Kohn  and  A  ns  ley  (1987)  demonstrate  that  the  signal  extraction  problem  of  (1.1)  and  the 
smoothing  problem  of  (1.2)  are  equivalent  problem  statements.  The  equivalence  also  holds  for 
broad  variations  of  signal  extraction  and  smoothing  problems.  All  of  the  time  series  analysis  prob¬ 
lems  that  we  treat  here  are  variations  of  the  signal  extraction/smoothing  problem  in  (1.1)  and 
(1.2). 

An  implication  of  Akaike  (1980)  is  that  a  Bayesian  interpretation  of  the  smoothing  problem 
in  (1.2)  implies  that  the  difference  equation  constraint  is  a  stochastically  perturbed  sero-mean  unk¬ 
nown  variance  difference  equation.  The  stochastically  perturbed  difference  equation  constraint  in 
the  trend  estimation  problem  is  a  smoothness  priors  constraint  in  the  time  domain.  Akaike  (1980), 
considered  other  time  domain  smoothness  priors  constraint  problems  including  the  Shiller  distri¬ 
buted  lag  problem  and  the  seasonal  adjustment  of  time  series.  Ishiguro  et  al.  (1981)  used  time 
domain  smoothness  priors  constraints  and  fixed  effects  regression  in  an  analysis  of  tidal  effects. 
Akaike  (1979)  employed  a  frequency  domain  smoothness  priors  constraint  on  the  distributed  lag 
parameters  in  the  Shiller  problem.  Gerach  and  Kitagawa  (1984)  and  Kitagawa  and  Gersch  (1985a) 
are  other  frequency  domain  smoothness  priors  time  series  problem  analyses. 


Shiller  (1973),  Akaike  (1980),  and  all  of  the  aforementioned  smoothness  priors  analyses,  are 
Bayesian  analyses  of  the  linear  model  with  Gaussian  stochastic  constraints  and  Gaussian  distur¬ 
bances.  The  critical  ideas  in  smoothness  priors  are  the  likelihood  of  the  Bayesian  model  and  the 
use  of  likelihood  as  a  measure  of  the  goodness  of  fit  of  the  model.  In  our  analysis,  hyperparameters 
have  interpretations  as  noise  to  signal  ratios  and  they  have  a  remarkable  role  in  the  analysis.  The 
maximisation  of  the  likelihood  of  a  small  number  of  hyperparameters  permits  the  robust  modeling 
of  a  time  series  with  relatively  complex  structure  and  a  very  large  number  of  implicitly  inferred 
parameters.  When  we  consider  alterntative  smoothness  priors  models,  with  different  distributional 
assumptions  or  different  numbers  of  parameters  to  model  the  same  data,  we  use  the  Akaike  AIC 
statistic  (Akaike  1973),  to  choose  between  candidate  models.  Kitagawa  (1987)  is  a  smoothness  pri¬ 
ors  state  space  modeling  of  nonstationary  time  series  in  which  neither  the  system  noise  or  the 
observation  noise  are  necessarily  Gaussian  distributed. 

The  original  Whittaker  problem  has  also  given  rise  to  work  on  splines  in  numerical  analysis 
and  to  related  smoothing  problem  analysis,  particularly  by  Wahba  (1977), (1982).  Smoothness  pri¬ 
ors  relates  to  the  ill-posed  problems  and  problems  of  statistical  regularisation  that  have  been  con¬ 
sidered  extensively  in  the  Soviet  Union  by  Tikhonov  (1965)  and  his  sssociates.  Also  related  are 
the  "bump  hunting  "-penalised  likelihood  methods,  Good  and  Gaskins  (1980)  and  Wecker  and  Ans- 
ley  (1983)  and  O’Sullivan  et  al.  (1986).  Vigorous  work,  primarily  at  the  Institute  of  Statistical 
Mathematics,  Tokyo,  has  resulted  in  the  application  of  smoothness  priors  methods  to  a  variety  of 
applications,  other  than  the  ones  we  discuss  here.  These  applications  include  the  seasonal  adjust¬ 
ment  of  time  series,  (Akaike  1980b),  tidal  analysis  (Ishiguro  et  al.  1981),  binary  regression  (Ishi- 
guro  and  Sakamoto  1983),  cohort  analysis  (Nakamura  1986),  and  density  estimation  (Tanabe  and 
Sagae  1987). 

Smoothness  priors  problems  that  are  amenable  to  analysis  by  least  squares  algorithms  are 
treated  in  Section  2.  The  likelihood  of  the  Bayesian  model,  as  done  by  Akaike,  is  in  Section  2.1. 
The  Bayesian  solution  to  the  smoothing  problem  originally  posed  by  Whittaker  is  also  shown 
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there.  In  that  problem,  the  smoothness  priors  constraints  are  time  domain  constraints.  The  priors 
are  expressed  as  sero-mean  unknown  variance  stochastically  perturbed  kth-order  random  walk 
difference  equations.  In  Section  2.2,  the  estimation  of  the  power  spectral  density  from  short  dura* 
tion  stationary  time  series  illustrates  the  use  of  frequency  domain  smoothness  priors  constraints. 
In  Section  3  several  smoothness  priors  nonstationary  time  series  problems,  which  are  amenable  to  a 
Kalman  filter  state  space  method  of  analysis,  including  examples  of  the  modeling  of  nonstationary 
mean  and  nonstationary  covariance  time  series,  are  shown.  All  of  the  aforementioned  treat  a 
linear  model,  Gaussians  distributions  situation.  That  method  is  generalized  in  Section  4.  There 
we  show  a  smoothness  priors  state  space  not  necessarily  Gaussian  nonstationary  time  series 
analysis  method.  Finally,  Section  5  is  a  summary. 


2  SMOOTHNESS  PRIORS  MODELING:  LEAST  SQUARES  ANALYSIS 


In  Section(2.1)  we  review  the  concept  of  smoothness  prion  B&yesi&n  modeling  &s  introduced 
in  Ak&ike  (1980).  Th&t  method  is  applied  to  the  problem  addressed  by  Whittaker  (1923),  the  esti¬ 
mation  of  a  trend  in  white  noise.  The  smoothness  prion  constraint  is  expressed  as  a  kth  order  ran¬ 
dom  walk  with  a  sero-mean,  unknown  variance  perturbation.  The  variance  is  a  hyperparameter  of 
the  prior  distribution.  The  constraint  is  a  time  domain  constraint  on  the  prion.  In  Section  2.2  we 
introduce  the  notion  of  frequency  domain  constraint  on  the  prion.  That  method  is  used  in  the 
estimation  of  the  power  spectral  density  of  a  stationary  time  series.  Section  2.3  is  a  discussion. 

The  frequency  domain  smoothness  priors  method  is  particularly  suited  for  the  situation  in 
which  only  a  short  span  of  data  is  available  for  analysis.  In  that  case,  the  results  of  conventional 
parametric  model  analysis  methods  are  particularly  sensitive  to  the  choice  of  model  order.  We  cir¬ 
cumvent  that  problem,  using  the  frequency  domain  smoothness  prion,  by  tending  to  fit  models 
that  are  "too  long".  Those  prion  reflect  the  integrated  squared  kth  derivative  with  respect  to  fre¬ 
quency  of  the  departure  from  model  smoothness.  The  estimation  of  the  model  parameters  and  an 
additional  small  number  of  hyperparameten  is  required.  The  maximisation  of  the  likelihood  of 
the  hyperparameten  is  the  critical  computation. 

2.1  SMOOTHNESS  PRIORS  BAYESIAN  MODELING 

Consider  the  linear  regression  model  subject  to  Bayesian-stochastic  constraints 


The  dimensions  of  the  matrices  in  (2.1.1)  are  y:  nxl;  X.  nxp;  0:  pxl.  c3  and  X3  are  unknown,  y  is 
the  vector  of  observed  data,  X  and  D  are  assumed  known.  0  is  the  normally  distributed  prior 
parameter  vector.  The  observation  noise  variance  is  o3.  In  this  conjugate  family  Bayesian  situa¬ 
tion  (Berger  1985),  the  mean  of  the  posterior  normal  distribution  of  the  parameter  vector  6  minim- 


(2.1.2) 
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If  A1  were  known,  the  confutation*]  problem  in  (2.1.2)  could  be  solved  by  an  ordinary  least 


squares  computation.  The  solution  for  0,  the  posterior  mean,  is  the  minimiser  of 


(2.1.3) 


That  solution  is 


i  «  [xTX  +  A,DrZ?]  ^ XTy 


(2.1.4) 


with  the  residual  sum  of  squares, 


sse($,a3)~vti  -  iT\xTx  +  A2Z>rZ>]  V 


(2.1.5) 


For  a  smoothness  priors  interpretation  of  the  problem  in  (2.1.1)  and  (2.1.2),  multiply  (2.1.2) 


by  -1/2<t3  and  exponentiate.  Then  the  0  that  minimises  (2.1.2)  also  maximizes 


~pl -h  l*,-^l  exp \~£  I® 


(2.1.6) 


In  (2.1.6),  the  posterior  distribution  interpretation  of  the  parameter  vector  0  is  that  it  is  propor¬ 


tional  to  the  product  of  the  conditional  data  distribution  (likelihood),  p[y\  X,0,o*),  and  a  prior 


distribution,  »(0|  A i,ai)  on  0, 


»(*l  MV*)  a  p(y|  Jr,tf,«r*)w(^|  A t,o3)  . 


(2.1.7) 


The  integration  of  (2.1.7)  yields  L(A,,o3),  the  likelihood  for  the  unknown  parameters  A2  and  a3, 


L(AV)  -  f"  *(f|  y,A \o')dO  . 

J  “00 


(2.1.8) 


I.J.  Good  (1965)  referred  to  the  maximisation  of  (2.1.8)  as  a  Type  II  maximum  likelihood  method. 


Since  x(9|  y.A2,^2)  is  normally  distributed,  (2.1.8)  can  be  expressed  in  the  closed  form,  (Akaike 


1980), 


(2.1.9) 

The  maximum  likelihood  estimator  of  a1  is 

a1  -  SSE(l\*)/N  .  (2.1.10) 

It  is  convenient  to  work  with  -2  log  likelihood.  Using  (2.1.10)  in  (2.1.9)  yields 

(2.1.11) 

-2/o*L(AV*)  =  Nlog2*  +  NlogSSE^P,X7)/N)  +  log|  XTX  +  )?DTD\  -  log|  X7DTD\  +  N. 

A  practical  way  to  determine  the  value  of  A3  for  which  the  -21og-likelihood  is  minimised,  is  to  com¬ 
pute  the  likelihood  for  discrete  values  of  A2  and  search  the  discrete  -21og  likelihood-hyperparameter 
space  for  the  minimum.  Akaike  (1980)  is  very  likely  the  first  practical  use  of  the  likelihood  of  the 
Bayesian  model  and  the  use  of  the  likelihood  of  the  hyperparameters,  as  a  measure  of  the  goodness 
of  fit  of  a  model  to  data. 

Estimating  a  Trend 

Here  we  return  to  the  original  problem  posed  in  the  Introduction.  We  use  the  notation  /„«  t„ 
where  t„  is  the  trend  at  time  n  to  emphasise  the  fact  that  we  are  estimating  the  mean  of  a 
nonstationary  mean  time  series.  A  critically  important  observation  is  that  from  the  stochastic 
regression  or  Bayesian  point  of  view,  the  difference  equation  constraints  in  the  Whittaker  problem 
are  stochastic.  That  is,  V**»  “  wn,  with  w„  assumed  to  be  a  normally  distributed  sero-mean 
sequence  with  unknown  variance  r2  .  For  example  for  1-1  and  t-2  those  constraints  are: 

-  ‘.-1  +  •.!  (2  1  12) 
“  2t„_,  —  I,.j  +  w,,. 

A  parameterisation  which  relates  the  trend  estimation  problem  to  the  earlier  development  in  this 
section  is  i2  =  o2/ A2.  Corresponding  to  the  matrix  D  in  (2.1.2),  for  i- 1  and  2,  the  smoothness 
constraints  can  be  expressed  in  terms  of  the  following  NxN  constraint  matrices: 


L( AV)  -  (2*t2)-*/2|  A2DrD|  ‘/2I  XTX  +  A2DrD|  -,/,exp|-~55£(^,A,)| 


T 


a 

a 

-1  1 

-a  / 

-1  1 

0 

.  D,  » 

1  -2  1 

1-2  1  0 

-1  1 

1-2  1 

In  (2.1.13)  a  and  $  are  small  numbers  that  are  chosen  to  satisfy  initial  conditions. 


I 


(2.1.13) 


For  fixed  k  and  fixed  A1  the  least  squares  solution  can  be  simply  expressed  in  the  form  of 

(2.1.13).  For  example  with  k  -  2,  the  solution  {<atn  -  1,...,JV)  satisfies 


(2.1.14) 


Note  that  the  problem  in  (2.1.14)  is  a  version  of  the  Bayesian  linear  stochastaic  regression  problem 
in  (2.1.3)  with  9  «  t  -(t„...,t#) r,  X  •  I,  the  NxN  identity  matrix,  and  I)  -  Dt  or  Dt  .  From 
(2.1.3), the  solution  to  (2.1.14),  with  D-D3,  is 

f  -  |/  +  X*DfD3\-ly  (2.1.15) 


and  the  value  of  SSE(&, A1)  is  given  by  (2.1.5)  with  $—t,X—I,D—D3.  The  smoothing  problem 
expression  of  (2.2.15)  is  that  the  solution  vector  is:  f~(ti|yi...,tjy)  /f)T-  The  least  squares  problem  in 

(2.1.14),  with  D-Dj,  is  solved  for  discrete  values  of  A*D  and  the  -2  log  likelihood-hyperparameter 
space  is  searched  for  a  minimum.  From  (2.1.11),  the  minimised  value  of  -21og  likelihood  for  this 
problem  is: 

(2.1.16) 

-2/o»X,(Jt*,^3)  -  Nlogl*  +  NlogSSE{{?, \*)/N))  +  log]  A ,D^Di  +  /|  -  log]  A7D,rI>,|  +  1 V. 


The  numerical  values  of  SSE(  f*,A* )  and  of  the  determinants  in  (2.1.16)  are  transparent  in  a  least 
squares  algorithm  analysis  of  (2.1.14).  Since  A  -  er/r,  A1  has  a  noise-to-signal  ratio  interpretation. 
Smaller  A  corresponds  to  smoother  trends. 
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An  Example  of  Trend  Estimation 


We  consider  the  example  of  an  asymetrically  truncated  normal  density-like  function  in  the 
presence  of  additive  noise,  JV(0,<r*)  .  Figure  la  shows  the  smooth  function  t„,n-l,...,Af  and  the 
superposition  of  t„  and  the  additive  noise.  The  problem  is:  Given  the  noisy  observations  j„,  esti¬ 
mate  the  unknown  smooth  function  that  is  in  the  noise,  i.e.  specify  f,i  *,n  =  1,...,JV.  We  solved  the 
least  squares  computational  problem  in  (2.1.14)  using  the  Householder  transformation  method.  -2 
log  likelihood  of  the  hyperparameter  model  is  computed  from  (2.1  16) 

The  critical  role  of  the  hyperparameter  is  transparent  in  this  example  Figures  lb.c.d  show 
the  estimated  trend  for  values  of  the  hyperparameters  that  are  too  small.  ( Jt*  »  0  00001)  and  too 
large  (A  -  0.1)  as  well  as  the  hyperparmeter  for  which  -2k>g  likelihood  is  minimised 
(A  »  G. 00136).  As  anticipated,  with  the  hyperparameter  defined  as  indicated  above,  the  estimated 
trend  for  a  too  large  value  of  the  hyperparameter  is  too  bumpy  and  the  estimated  trend  for  a  too 
small  value  of  the  hyperparameter  is  too  smooth. 

It  is  important  to  note  that  in  this  example,  although  the  truncated  Gaussian  satisfies 
V*logl,  ”=  0,  we  estimate  the  trend  with  the  "incorrect"  model  the  stochastic  ally  per¬ 

turbed  second  order  difference  equation.  The  point  is  that  a  prion  we  do  not  know  a  correct 
expression  for  the  underlying  smooth  function  in  (1.1).  Different  hyperparameter  values  result  in 
solutions  of  the  stochastically  perturbed  second  order  difference  equation  with  very  different 
smoothness  properties.  The  best  of  those  solutions  yields  a  very  good  approximation  to  the  ongi- 
nal  unknown  smooth  function.  This  key  observation  was  referred  to  by  Shiller,  (1973).  as  the 
"flexible  ruler  approach". 

2.2  SMOOTHNESS  PRIORS  IN  THE  FREQUENCY  DOMAIN 

The  smoothness  priors  in  the  estimation  of  the  mean  value  of  a  nonstationary  time  senes  was 
expressed  as  a  time  domain,  stochastically  perturbed  difference  equation  constraint  on  the  evolu¬ 
tion  of  the  trend.  Smoothness  priors  contraints  can  also  be  expressed  in  the  frequency  domain  In 
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this  section,  we  illustrate  the  use  of  frequency  domain  prion  for  the  estimation  of  the  power  spec* 
tral  density  of  a  stationary  time  series. 

A  Long  AR  Model  For  Spectral  Estimation 

A  smoothness  priors- long  autoregressive  (AR)  model  approach  is  used  here  for  spectral  den¬ 
sity  estimation. 

The  classical  windowed  periodogram  method  of  spectral  estimation  is  satisfactory  for  spectral 
analysis  when  the  data  set  is  "long".  The  alternative  of  spectral  estimation  via  the  fitting  of 
parametric  models  to  moderate  length  data  spans  became  popular  in  the  last  decade,  Kesler(1986). 
When  the  data  span  is  relatively  short,  three  facts  render  parametric  modeling  methods  of  spectral 
estimation  statistically  unreliable.  One  is  the  instability  or  small  sample  variability  of  whatever 
statistic  is  used  for  determining  the  best  order  of  parameteric  model  fitted  to  the  data.  The  second 
is  that  usually  the  "parsimonious"  parametric  model  is  not  a  very  good  replica  of  the  system  that 
generated  the  data.  The  third  is  that  the  spectral  density  of  the  fitted  parametric  model  can  not 
possibly  be  correct.  Independent  of  which  parametric  model  order  is  selected,  there  is  information 
in  the  data  to  select  models  of  different  orders.  A  Bayesian  estimate  of  power  spectral  density 
requires  that  the  spectral  density  of  parametric  models  of  different  model  orders  be  weighted  in 
accordance  with  the  likelihood  and  the  prior  of  the  model  order  of  different  models. 

The  smoothness  priors  AR  model  of  spectral  estimation  alleviates  this  problem.  A  particular 
class  of  frequency  domain  smoothness  priors  is  assumed  for  the  coefficients  of  AR  model  order  M, 
with  M  relatively  large.  The  likelihood  of  the  hyperparameters  that  characterise  the  class  of 
smoothness  priors  is  maximised  to  yield  the  best  AR  model  of  order  M  with  the  best  data  depen¬ 
dent  priors.  (A  more  complete  treatment  of  the  modeling  discussed  here  is  in  Kitagawa  and 
Gersch,  1985a.) 


The  Smoothness  Priors  Long  AR  Model 


Consider  the  sutoregressive  model  of  order  M , 

u 

*■  “  £  amK-m  +  *•  (2-2.1) 

■•1 

In  (2.2.1)  {«„}  is  s  Gaussian  white  noise  with  mean  sero  and  variance  A  least  sqnares  fit  of  the 
AR  model  to  the  data,  h,—,Pn,  with  the  first  M  observations  li-ifih-Mr-ilfo  treated  as  given 
constants,  leads  to  the  minimisation  of 

N  U 

Elf.  -  Ea»f—  I*  (2.2.2) 

•  •1  *•! 

If  M  is  comparable  to  N,  the  resnlt  of  the  least  squares  computation  can  be  meaningless.  The 
smoothness  priors  solution  mitigates  this  difficulty  by  considering  the  solution  of  the  constrained 
least  squares  problem.  We  consider  a  frequency  domain  smoothness  priors  constraint  on  the  distri¬ 
bution  of  the  AR  model  parameters.  The  frequency  response  function  of  the  whitening  filter  of  the 
AR  process  given  by 

M 

Mf)  -  1  -  E  *-e*P  |-2*»'*n/).  (2.2.3) 

»«i 

Let  a  measure  of  the  smoothness  of  the  frequency  response  function  be 

R“  *  /-,/,!  -*^1  (2-2.4) 

From  the  definition  in  (2.2.4),  a  large  value  of  Rt  means  an  unsmooth  (in  the  sense  of  differential) 
frequency  response  function.  We  also  use  the  sero  derivative  smoothness  constraint, 

-  f'j  Mf)\  *df  -  1  +  E  ‘i  (2.2.5) 

as  &  penalty  to  the  whitening  filter. 

With  these  constraints,  and  with  A3  and  »/*  fixed,  the  AR  model  coefficients  {aw,m»l,...,A/}, 
minimise 


(2.2.6) 


El». -  E  «•».—!*  +  **E  mU*l  ♦  ^E  «i- 

•■1  a-1 

In  (2.2.6),  A*  tad  v*  are  the  hyperparameters.  By  a  proper  choice  of  these  parameters,  our  esti¬ 
mates  of  the  AR  model  coefficients  balance  the  tradeoff  between  the  infidelity  of  the  model  to  the 
data  and  the  infidelity  of  the  model  to  the  frequency  domain  smoothness  constraints.  For  com¬ 
pleteness,  to  within  a  constant,  the  Gaussian  priors  on  the  AR  model  coefficients  corresponding  to 
the  R0  and  Rk  constraints  are 

«*P — E  mua*txp~ £<r*  E  (2.2.7) 


Following  our  earlier  discussion,  define  the  matrices  D  and  a  and  the  matrices  X  and 


V  by 
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.  »  * 
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**  i 

••■tun 

(2.2.8) 


Then  the  AR  model  coefficients  satisfy 

d  -  (XTX  +  DTD)-'XTy,  (2.2.9) 

and  the  residual  sum  of  squares  is 

5( AV)  -  yTy  -  iT{XTX  +  DtD)&.  (2.2.10) 

The  likelihood  of  the  hyperparamter  model  is 

X.(AV,<r»)  -  DTD\  *'*|  XTX  +  DTD\  ->/»«p|^.5(A*,j^)J  .  (2.2.11) 

Given  A*  and  i /*,  the  maximum  likelihood  estimate  of  o2  is,  The  ML  estimates  of 

A*  and  t?  are  obtained  by  minimizing 

-2logL(g\  A2, s’2,*2)  -Motfwo3  +  log|  DT D\  -  log\  XTX  +  DTD\  +N.  (2.2.12) 


with  respect  to  A1  and  s’2.  Computation  of  the  likelihood  over  a  discrete  A.A5,!/2  parameter  grid 


and  Marching  over  the  resulting  discrete  likelihood-hyperparameter  space  for  the  minimum  of  -2 
log  likelihood  yields  the  desired  smoothness  priors  long  AR  model. 

The  frequency  domain  smoothness  priors  constraint  used  here  has  an  interpretation  as  a  con¬ 
straint  on  the  smoothness  of  the  whitening  filter  of  the  AR  model.  (The  Oth  derivative  has  an 
energy  constraint  interpretation.)  An  important  facet  of  our  computations  is  that  they  are  compu¬ 
tationally  tractable.  That  allows  us  to  remain  within  the  framework  of  the  general  linear  model. 

An  Example,  Analysis  of  Canadian  Lynx  Data 

The  data  example  discussed  here  is  the  analysis  of  the  Canadian  Lynx  data  (n  =  114).  Other 
examples  are  shown  in  Kitagawa  and  Gersch  (1985a). 

The  AR  model  order  was  set  to  20  and  up  to  the  fourth  order  smoothness  prior  constraint 
was  tried.  The  hyperparameters  A  and  v  were  searched  over  the  discrete  values 
A  *  2>-“o0iy=lt...,10  where  is  the  sample  variance  of  the  data  and  u  =  tV0,  »=0,1,...,4,  for 
each  value  of  the  order  of  the  smoothness  prior  constraint. 

The  overall  best  model  was  1=  1,1/  ”  0,A  =  0.17S.  The  Bayesian  estimate  of  the  spectrum  is 
shown  in  Figure  2a.  For  comparison,  AR  models  of  order  up  to  20  were  fitted  by  the  least  squares 
method.  The  AIC  best  order  was  11.  The  A1C  criterion- AR  modeled  spectrum  is  shown  in  Figure 
2b.  In  the  Bayesian  model  spectral  estimate,  the  peaks  at  the  high  frequencies  are  significantly 
reduced  compared  to  the  AR  model  model  spectrum  estimate,  while  the  ones  in  the  lower  frequen¬ 
cies  remain  unchanged.  Figure  2c  shows  the  superimposed  estimated  spectra  obtained  from  AR 
models  with  different  model  orders.  From  Figures  2b, 2c  we  see  that  the  shape  of  the  two  right¬ 
most  peaks  of  Figure  2.1B  vary  considerably  with  model  order.  Thus  they  are  not  estimated  reli¬ 
ably  by  fixed  order  models.  That  is  typical  of  the  problem  of  estimating  spectral  density  by  fixed 
order  parametric  models.  If  the  model  fitted  to  the  data  is  not  in  the  class  of  models  which  gen¬ 
erated  the  data,  the  model  fitting  is  only  approximate.  The  selection  of  the  best  non  Bayesian 
parametric  model  ignores  the  evidence,  in  the  Bayesian  sense,  for  other  parametric  models  when  in 
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fact  it  should  be  taken  into  account.  The  suppression  of  those  peaks  by  the  smoothness  priors-long 
AR  model  method,  shown  in  Figure  2a,  therefore  seems  quite  reasonable. 

2.S  DISCUSSION 

The  variation  of  the  behavior  of  the  solution,  from  very  rough  to  very  smooth,  in  the  trend 
estimation  problem  under  the  smoothness  prion  constraints  for  different  values  of  the  hyperparam¬ 
eters,  is  characteristic  of  the  profound  effect  of  the  hyperparameter.  The  log  likelihood  of  the 
hyperparameter  venus  the  hyperparameter  changes  gradually  in  the  vicinity  of  the  maximum  log 
likelihood.  That  fact  permits  a  discrete  likelihood-hyperparameter  search  procedure  to  be  used  in 
conjunction  with  a  Householder  transformation  algorithm  to  realise  a  reasonable  computational 
procedure. 

The  stochastically  perturbed  difference  equation  constraint  in  the  trend  estimation  problem 
is  a  time  domain  smoothness  prion  constraint.  Akaike  (1980),  considered  other  time  domain 
smoothness  constraint  problems  including  the  Shiller  distributed  lag  problem  and  the  seasonal 
adjustment  of  time  series.  Ishiguro  et  al.  (1981),  used  time  domain  smoothness  constraints  and 
fixed  effects  regression  in  the  analysis  of  earth  tide  data.  The  Householder  transformation  least 
squares  algorithm  FORTRAN  programs  BAYSEA  and  BAYTAP-G  in  TIMSAC-84  (Akaike  et  al. 
1985),  are  suitable  for  seasonal  adjustment  and  tidal  analyses  respectively. 

Akaike  (1979)  illustrated  a  frequency  domain  smoothness  prior  for  the  solution  of  the  Shiller 
(1973)  impulse  response  estimation  problem.  We  used  frequency  domain  smoothness  priors  here 
for  spectral  density  estimation.  Gersch  and  Kitagawa  (1984)  is  an  application  of  the  frequency 
domain-smooothness  priors  method  to  transfer  function  estimation.  Our  smoothness  priors 
method  is  particularly  suited  for  the  situation  in  which  only  a  short  span  of  data  is  available  for 
analysis.  In  that  situation,  the  results  of  conventional  parametric  model  analysis  methods  are  par¬ 
ticularly  sensitive  to  the  choice  of  model  order.  We  circumvent  that  problem,  with  the  Bayesian 
smoothness  priors  method,  by  tending  to  fit  models  that  are  "too  long".  The  model  parameters 


are  apecified  aa  the  solution  to  a  constrained  least  squares  problem  in  which  the  constraints  are 
expressed  in  the  frequency  domain.  The  likelihood  of  the  hyperparameters  is  readily  computable 
in  a  least  squares  framework  with  the  frequency  domain  priors. 

The  goodness  of  the  choice  of  the  fequency  domain  smoothness  priors  can  be  appraised  by 
evaluating  its  performance  in  various  conceivable  situations.  The  smoothness  priors-long  AR 
model  gives  reasonable  results  in  the  analysis  of  the  real  Lynx  data  in  comparison  with  the 
minimum  AIC-AR  model  method.  Kitagawa  and  Gersch  (1985a)  show  smoothness  priors-  long 
AR  model  results  that  were  superior  to  the  minimum  AIC-AR  model  method  in  a  simulated-two 
sine  waves  in  noise  case,  when  the  data  actually  corresponds  to  an  ARMA  model.  This  flexibility 
of  performance  is  what  is  desired  from  a  Bayesian  model. 

Also  in  Kitagawa  and  Gersch  (1985a),  a  Monte  Carlo  study  of  the  expected  entropy  experi¬ 
ment  was  done  to  appraise  the  performance  of  the  smoothness  priors-long  AR  model  for  spectral 
estimation  against  performance  of  parametric  AR  models  whose  order  was  determined  by  Akaike’t 
AIC  criterion.  The  smoothness  priors-long  AR  model  method  was  superior  to  the  minimum  AIC- 
AR  model  method  in  the  two  simulation  model  cases  studied.  In  one  case,  the  simulation  model 
was  in  the  AR  model  class.  In  the  other  case,  the  simulation  model  was  not  in  the  AR  model 
class.  Thus,  the  example  shown  here  and  the  Monte  Carlo  study  reported  in  Kitagwa  and  Gersch 
(1985a)  are  evidence  to  support  the  soundness  of  our  empirical  frequency  domain  smoothness  pri¬ 
ors  approach. 


S  STATE  SPACE  GAUSSIAN  SMOOTHNESS  PRIORS  MODELING 

A  state  apace  modeling  approach  for  the  linear  model  with  Gaussian  system  and  observation 
noise  that  is  the  equivalent  of  the  least  squares  computational  approach  to  smoothness  priors 
modeling,  was  shown  in  Brotherton  and  Gersch  (1981)  and  Kitagawa  (1981).  The  state  space 
smoothness  priors  modeling  method  was  applied  to  the  modeling  of  nonstationary  mean  and  nons¬ 
tationary  covariance  time  series,  Gersch  and  Kitagawa  (1983a,  1985)  and  Kitagawa  and  Gersch 
(1984,1985b). 

In  the  modeling  of  nonstationary  time  series  discussed  below,  there  tends  to  be  more  parame¬ 
ters  than  data.  In  that  case,  attempts  to  fit  the  parameters  by  least  squares  or  any  other  ordinary 
means  will  yield  poor  parameter  estimates.  The  smoothness  priors  permit  the  model  parameters  to 
be  expressed  implicitly  as  the  solution  of  sero-mean  unknown  variance  stochastically  perturbed 
difference  equations.  The  variances  are  hyperparameters  of  the  prior  distribution  of  the  parameters. 
One  interpretation  of  the  role  of  the  smoothness  priors  is  that  they  permit  a  realisation  of  a  com¬ 
putational  procedure  to  estimate  the  model  parameters. 

In  this  section,  computational  procedures  for  the  modeling  of  nonstationary  mean  and  nons¬ 
tationary  covariance  time  series  are  discussed  that  are  variations  of  the  procedures  discussed  in  our 
previous  papers.  Examples  are  shown  in  Section  3.4.  A  discussion  of  other  problems  treated  by 
the  smoothness  priors-state  space- linear- Gaussian  model  and  comments  appear  in  Section  3.5. 
Kalman  filter,  prediction  and  smoothing  formulas  and  computation  of  the  likelihood  of  the  linear 
Gaussian  model  are  shown  in  Section  3.2. 

3.1  NONSTATIONARY  MEAN  SMOOTHNESS  PRIORS  STATE  SPACE  MODEL¬ 
ING 

Time  series  with  trend  and  seasonal  components  occur  for  example  in  meteorological,  oceano¬ 
graphic  and  econometric  studies.  Here  we  consider  a  complex  nonstationary  mean  time  series 
problem  motivated  by  economic  time  series  considerations.  The  economic  time  series  nonstation- 


ary  mean  can  be  decomposed  into  a  trend  i,,  a  globally  atationary  component  a  aeaaonal  com* 
ponent  a  trading  day  factor  dm  and  an  observation  noise  component  ta, 

+  d.  +  <a.  (8.1.1) 

Each  of  the  aforementioned  components  can  be  modeled  as  a  stochastically  perturbed  difference 
equation.  The  generic  state  space  model  for  this  decomposition  can  be  expressed  by 

*.  -  +  Gwm  (8.1.2) 

V.  -  Hazn  +  < . 

where  F,G  and  Hn  are  MxM,  MxL  and  1  xM  matrices  respectively.  w„  and  « „  are  each  assumed 
to  be  sero  mean  independent  normally  distributed  random  variables.  za  is  the  state  vector  at  time 
n  and  y„  is  the  observation  at  time  n.  For  any  particular  model  of  the  time  series,  the  matrices 
F,G  and  Ha  are  known  and  the  observations  are  generated  recursively  starting  from  an  initial 
state  that  is  assumed  to  be  normally  distributed  with  mean  z0  and  covariance  matrix  V0. 

The  state  space  model  that  includes  the  local  polynomial  trend,  stationary  AR  coefficient, 
trading  day  effects  and  observation  error  components  can  be  written  in  the  orthogonal  decomposi¬ 
tion  form 

F,  0  0  0  <7,  0  0  0 

0  F,  0  0  0  C,  0  0 

0  0  Ft  0  *■-»  +  0  0  <7,  0 

0  0  0  Ft  0  0  0  <?4 

».  -  [JffB  H ,  Ht  //.,.)*.  +  ea. 

The  component  modeb  (F;, <7, ■,//,)  in  order,  (/— 1,...,4)  represent  the  trend,  stationary  AR,  sea¬ 
sonal  and  trading  day  effects  components  respectively.  Some  of  the  particular  trend,  AR,  seasonal 
and  trading  day  difference  equation  constraints  that  we  have  employed  and  that  have  representa¬ 
tions  in  the  (Fl,Gi,Hi)  matrices  in  (3.1.3)  are  shown  below. 
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The  trend  component  ta  satisfies  a  kth  order  stochastically  perturbed  difference  equation 


V**.  “  «...  (31.4) 

where  Wj>a  is  an  i.i.d.  sequence  with  w1<a~JV(0,rf).  (See  (2.1.12).) 

The  stationary  AR  component  »,  is  assumed  to  satisfy  an  AR  model  of  order  p.  That  is 
given  by 

'•  ”  +  •  1  *  +  +  ®j,.  •  (3-1-5) 

In  (3.1.5)  w2%m  is  and  i.i.d.  sequence  with  wj^—JVfO.rJ).  The  seasonal  component  of  the  period  L 
difference  equation  is 

s.  -  -  «,_i  -  •  •  •  +  wi  u  .  (3.1.6) 

In  (3.1.6),  wIlt  is  an  i.i.d  sequence  with  «ulll~./V(01rJ). 

The  trading  day  effect  model  is 


+  •  •  +  (3-1.7) 

where  0,  m  denotes  the  trading-day  effect  factor  and  d,  .  corresponds  to  the  number  of  the  ith  day 

r 

of  the  week  at  time  n.  Implicit  in  (3.1.7)  is  the  constraint  '  0.  There  is  no  stochastic  com- 

•«i 

ponent  in  (3.1.7). 

For  a  general  model  including  local  polynomial  trend,  AR  component  trend,  local  seasonal 
component  and  trading  day  effect  components,  the  state  or  system  noise  vector  and  observation 
nosie  c,  are  assumed  to  i.i.d.  with  sero  mean  and  diagonal  covariance  matrix 


N 


0 

0 

0 

0 


rf  0  0  0 
0  rf  0  0 
0  0  ti  0 
0  0  0  a3 


(3.1.8) 


An  example  of  a  state  space  model  that  incorporates  each  of  the  components  with  trend  order  2, 
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AR  model  order  2  end  seasonal  component  with  period  L  is, 
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(3.1.9) 


The  smoothness  priors  problem  that  includes  all  of  the  components  in  the  decomposition 
identified  above  correponds  to  the  maximisation  of 


(3.1.10) 


rf  »  *-» 

■«]  i-O 


The  first  term  in  (3.1.10)  corresponds  to  the  conditional  data  distribution.  The  remaining  terms  in 

(3.1.10),  in  order,  corresspond  to  the  priors  on  the  trend,  the  globally  stochastic  component  and 
the  seasonal  component. 

The  role  of  the  hyperparameters  rf  and  rf  as  a  measure  of  the  uncertainty  in  the  belief  of  the 
priors  is  clear  from  (3.1.10).  Relatively  small  rf  (rf)  imply  relatively  wiggly  trend  (seasonal)  com* 
ponents.  Relatively  large  rf  (rf)  imply  relatively  smooth  trend  (seasonal)  components. 
Correspondingly,  the  ratio  of  r j/o3,  jm  1  or  3,  can  be  interpreted  as  signal-to-noise  ratios.  (The 


value  of  <r*  in  (3.1.10)  is  essentially  estimated  free  of  computational  cost  in  the  Kalman  filter  algo¬ 
rithm.) 


S.2  RECURSIVE  ESTIMATION  OF  STATE  AND  LIKELIHOOD  COMPUTA¬ 
TION 


Let  a  state  space  model  be  given  by 


*«  “  f'mtn-x  +  G«W, 
ft  = 


(3.2.1) 


where  tva~  N(0,Qn)  and  N(0,R%).  Given  the  observations  yl . yN  and  the  initial  conditions 

z0|o,  Void  tbe  one-step-ahead  predictor  and  the  filter  are  obtained  from  the  Kalman  filter  algo¬ 


rithm: 


Time  Update  (Prediction) 


+  G.Q.Gl 


(3.2.2) 


Observation  Update  (Filtering) 


*»i» m  ^»ift  ~ 

i^i. -i /-*.*.] *w 


(3.2.3) 


Using  these  estimatates,  the  smoothed  value  of  the  state  xa  given  the  entire  observation  set, 
ft,---,ftv,  >»  obtained  by  the  fixed  interval  smoothing  algorithm,  (Anderson  and  Moore  1979), 


A.  -  y.i.F.rVml,lm 

*»|W  “  *«|  ■  — 

V.IM-  V. |.  +  A, \vn+l]tl-  Va+I| a)Aar. 


(3.2.4) 


The  state  space  representation  and  the  Kalman  filter  yield  an  efficient  algorithm  for  the  likelihood 
of  a  time  series  model.  The  joint  distribution  of  ft,...,fty  is, 


v*  f 


(3.2.5) 
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with 


/(*. I  Vu-.Vn-i)  “  (2»».)-,/,exp|-^-(v>,  -  (3.2.6) 


Then,  the  log  likelihood,  /,  of  the  model  is  obtained  by 


I  -  -  - 
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Mog2»  +  £loge.  +  £  — *-(„.  - 
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(3.2.7) 


The  maximum  likelihood  estimate  of  the  model  parameters  are  obtained  by  maximising  (3.2.7) 
with  respect  to  those  parameters.  The  AIC  is  defined  by 


A/C  —  —  2 (maximum  log —likelihood)  +  2(number  of  parameter/ )  (3.2.8) 

Alternative  models  for  time  series  might  be  models  with  and  without  trading  day  effects  or  models 
with  and  without  AR  component  effects.  In  each  case,  when  we  consider  alternative  models,  the 
model  with  the  smallest  value  of  the  AIC  statistic  is  selected  as  the  AIC  best  model. 

In  fitting  a  stationary  model,  we  can  utilise  the  theoretical  mean  and  the  theoretical  covari¬ 
ance  of  the  state  vector  as  the  initial  values  z0|0  and  V0|0.  In  the  nonstationary  case  we  consider 
the  initial  vector,  z0|0,  as  an  unknown  parameter  and  estimate  it  by  using  the  entire  set  of  data. 
The  log  likelihood  obtained  by  estimating  the  initial  state  vector  is  a  natural  estimate  of  the 
expected  log  likelihood  of  the  predictive  distribution  (Akaike  1980b,  and  Gersch  and  Kitagawa 
1983a). 


3.3  NONSTATIONARY  COVARIANCE  MODELING 

Here, time  series  with  nonstationary  covariances  are  modeled  by  a  time  varying  autoregres¬ 
sive  (AR)  model  with  smoothness  constraints  on  the  AR  parameters.  Time  varying  AR  coefficient 
models  have  been  a  topic  of  research  for  some  time.  For  example,  see  Whittle  (1965),  Kozin 
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(1977)  and  Nicholls  and  Quinn  (1985)  and  the  extensive  references  therein,  particularly  to  the  use 
of  random  coefficients  in  econometric  modeling.  Earlier,  in  engineering  applications,  the  modeling 
of  nonstationary  covariance  time  series  was  done  by  fitting  locally  stationary  models,  and  by 
orthogonal  polynomial  expansions  of  AR  coefficient  models.  Astrom  and  Wittenmark  (1973, 
Theorem  5)  express  a  time  varying  AR  coefficient  model  that  includes  the  possibility  of  random 
AR  coefficients.  Bohlin  (1976)  is  an  early  application  of  the  analysis  of  time  series  models  with 
time-dependent  coefficients.  The  concept  of  the  likelihood  of  the  Bayesian  model  or  of  hyper¬ 
parameters  or  anything  related  to  a  smoothness  prior  do  not  appear  in  the  earlier  papers.  Those 
are  key  concepts  here.  Kitagawa  (1983)  is  a  precedent  to  the  material  discussed  in  this  section. 

The  problem  in  modeling  nonstationary  covariance  time  series  is  to  achieve  an  efficient 
parameterization  to  capture  the  local  and  global  statistical  relationships  in  the  time  series.  That 
objective  is  achieved  here  by  imposing  smoothness  priors  constraints  in  the  form  of  stochastically 
perturbed  difference  equations  on  the  evolution  of  the  AR  coefficients.  The  variances  of  the  white 
noise  stochastic  perturbations  are  the  hyperparameters  of  the  the  AR  coefficient  distribution.  The 
difference  equations  are  imbedded  into  a  state-space  representation.  A  relatively  large  AR  model 
order  is  chosen,  the  AR  coefficients  at  each  time  instant  are  also  smoothed  using  the  frequency 
domain  differential  contraints  on  AR  coefficients,  as  in  the  smoothness  priors-long  AR  model  for 
spectral  estimation,  Section  2.2.  For  each  order  of  the  differential  constraint,  'he  Kalman  filter 
yields  the  likelihood  of  the  hyperparameters.  The  smoothed  estimate  of  the  nonstationary  innova¬ 
tions  series  variance  is  also  computed.  That  is  used  in  the  computation  of  an  instantaneous  spec¬ 
tral  density  which  is  defined  in  terms  of  the  instantaneous  AR  model  coefficients  and  the  innova¬ 
tions  series  variance. 

The  Time  Varying  AR  Coefficient  Model 

A  time  varying  AR  coeficient  model  is  given  by 


(S.S.1) 


Vn  * 

1-1 

In  (S.S.l),  the  coefficients  o,  „  are  assumed  to  change  "gradually"  with  time  and  <a  is  assumed  to 
be  a  normally  distributed  white  noise  sequence  with  variance  a3.  Since  there  are  mx A  AR 
coefficients  in  the  model  in  (S.S.l),  an  attempt  to  fit  the  parameters  by  least  squares  or  any  other 
ordinary  means  to  the  N  observations  Vi>  ■  ■  •  ,Vn<  will  yield  poor  parameter  estimates.  We  con¬ 
sider  the  unknown  AR  coefficients  to  be  random  variables  and  impose  stochastic  constraints  on 
those  coefficients.  Those  constraints  define  a  Gaussian  smoothness  prior  distribution  on  the  time 
history  of  the  AR  coefficients  and  on  the  spectrum  at  each  time  instant.  A  simple  and  useful 
model  for  a  time  varying  AR  coefficient  model  is  obtained  by  the  stochastically  perturbed 
difference  equation  constraint  model 


V  *“!,•  ->m  . 


(3.3.2) 


For  convenience,  in  (3.3.2)  Si  m  is  assumed  to  be  a  sero-mean  Gaussian  white  noise  sequence  with 
variance  rf  independent  of «  and  n.  That  is,  rf  -  r3,  m. 

The  smoothness  priors  constraints  on  the  AR  coefficients  mitigate  the  problem  of  overpara- 
merixation  by  permitting  the  AR  coefficients  to  be  expressed  as  the  solution  of  the  constrained 
least  squares  problem 
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In  (3.3.3)  m  and  kx,k7  are  assumed  known  and  r^A2,*/2  are  the  tradeoff  parameters  which  balance 
the  infidelity  of  the  model  to  the  data  and  the  infidelity  of  the  model  to  the  smoothness  con¬ 
straints. 

Similar  to  the  analysis  in  Section  (2.1),  (3.3.3)  yields  a  Bayesian  interpretation  of  the  least 
squares  problem.  Multiply  (3.3.3)  by  -1/2 o3  and  exponentiate.  Then  ,  to  within  a  constant  term, 


(3.3.4) 
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expresses  the  product  of  the  prior  distribution  on  the  smoothness  of  the  spectrum  end  the  prior  dis¬ 
tribution  on  the  smoothenss  of  the  AE  parameters.  The  tradeoff  parameters  r>,A,,t'3  are  the  hyper¬ 
parameters  of  the  prior  distribution.  As  in  the  development  in  Section  (2.1),  the  product  of  the 
conditional  data  distribution,  (proportional  to  the  leftmost  term  in  (3.3.3)),  and  the  prior  distri¬ 
bution  in  (3.3.4)  yields  the  posterior  distribution  for  the  AR  parameters.  As  in  (2.1.8),  integration 
of  the  posterior  distribution  for  the  AR  parameters  yields  the  likelihood  for  the  smoothness  tra¬ 
deoff  parameters. 

State  Space  Time  Varying  AR  Coefficient  Model 

Define  the  kmx  1  state  vector  at  time  n  to  be  t,  =  (a,  ,,...,4,  ,,  .  .  . 

The  state  space  time  varying  AR  coefficient  model  is 

*»  -  Fx._,  +  Gu>.  (3.3.5) 

V.  =  +  ».■ 

In  (3.3.5),  Ha,  is  a  (m41)xim  observation  matrix,  u».  is  the  m  vector,  w.  =  (£,.,  .  .  .  ,6mH)T  and 
the  m  +  1  vector  «.=(<«,£,,) r,  is  defined  in  (3.3.7).  For  the  difference  equation  orders  k,  =  l  and 
i,®2,  the  matrices  F,G,and  H  are 

k=l:  F  -  (/.),  G  -  (/J,  Ha  -  HUm  «  (if.-,.  .  .  ,y._  J  (3.3.6) 

2/„  -/J  \lm 

k-2-.  F=  j  Q  ,  G  -  Q  ,  Hh  -  J/,,.  =  |/flt.  0  ...  0]. 

The  input  process  noise  w9,  the  observation  noise  c.  and  the  spectrum  smoothness  priors  noise 
form  the  2km  +  l  vector  (wJ,eu,(„)T  that  is  assumed  to  be  normally  distributed  and  independent 
with  time  with  the  mean  and  covariance  matrix, 

w.  o|  Q  o  o| 

<.  ~  N  0  ,  0  o3  0  (3.3.7) 

f,  o|  0  0  5/ 


In  (3.3.7),  the  mxm  diagonal  matrix  Q  has  the  the  element  1/r2  on  the  diagonal  and  for  k2=2,(in 


(3.3.3)),  the  mxm  diagonal  matrix  5  has  diagonal  elements  l/(A1+t/*,...,m*A1-«-t'*)>  (see  (2.2.6)). 

For  a  fixed  difference  order  t,,  the  beat  fit  of  the  state  apace  smoothness  priors  constraints- 
time  varying  AR  coefficient  model  to  the  data  yt,  .  .  .  ,  is  the  one  for  which  the  likelihood  of 
the  hyperparameters  t3, As,i/*,  are  maximised.  The  likelihood  is  computed  using  the  recursive  for¬ 
mulas  indicated  in  Section  3.2. 

The  Instantaneous  Variance  And  The  Instantaneous  Spectrum 

In  many  practical  data  analysis  situations,  the  relatively  fast  wiggles  of  a  nonstationary 
covariance  time  series  appears  to  be  modulated  by  a  relatively  slowly  changing  envelope  function. 
The  envelope  function  has  an  interpretation  as  a  change  of  scale  of  the  time  varying  AR  coefficient 
model  or  equivalently  as  the  smoothed  (trend)  value  of  the  instantaneous  variance,  (Section  3.1) 
A  key  idea  in  that  modeling  is  to  find  a  variance  stabilising  transformation  of  the  innovations  that 
yields  the  instantaneous  trend  in  an  additive  sero-mean  constant  variance  observation  noise, 
Wahba  (1980).  Let  e„,n= be  a  realisation  of  a  sero-mean  normally  distributed  white  noise 
with  unknown  variance  ojj.  Then,  if  «r|-=<r|B_l,  xi=|«!i»-i  -*-Sj»]/2,  is  an  independent  sequence  of 
chi-square  random  variables  with  two  degrees  of  freedom.  From  Wahba  (1980),  the  transforma¬ 
tion  t„=log(;ti)+7,  where  7=0.57721  is  the  Euler  constant,  leaves  the  independent  random  vari¬ 
able  tm  with  a  distribution  that  is  almost  normal  with  the  moments  l?|tlll]  =  /of<y;j1,  Vrsr[ta)|  =  **/6. 
That  idea  is  exploited  here. 

Consider  a  second  order  difference  equation  constraint  on  the  log  variance  defined  by 

(*  *•*) 

In  (3.3.8)  {«vm}  is  an  independent  sero-mean  normally  distributed  sequence  with  unknown  variance 
r3.  Define  a  state  vector  by  zm=  (t„  <»_|jr-  Then  in  state  space  form  the  constraint  model  in 
(3.3.8)  and  the  observation  model  are  given  by 
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Application  of  the  Kalman  filter, prediction  and  smoothing  algorithms  described  earlier  yield  the 


smooth  value  ta>  N,  the  logarithm  of  the  smoothed  estimate  of  the  changing  variance.  Our  estimate 


of  the  changing  variance  is  o\m ,H  =  *r“e*P(W  +  7)- 


Motivated  by  earlier  work  on  spectrum  estimation,  we  define  the  instantaneous  spectrum  of  a 


time  varying  coefficient  AR  process  by 


-l/2^/Cl/2. 
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(5.5.10) 


The  value  of  the  instantaneous  spectrum  is  obtained  by  substituting  the  smoothed  estimates  of  the 


time  varying  AR  coefficients  and  the  smoothed  estimate  of  the  innovations  variance  o*  into 


(3.3.10). 


3.4  EXAMPLES 


A  Nonstationary  Mean  Time  Series  Example 


The  RS WOMEN  series  of  the  Bureau  of  the  Census  data,  (Ze liner  1983),  is  analysed  here. 


The  series  consists  of  the  retail  sales  of  women’s  apparel,  reported  in  millions  of  dollars.  The  sales 


for  each  month  are  affected  by  the  number  of  times  each  day  of  the  week  occurs,  the  trading-day 


effect,  because  buying  behavior  differs  for  each  day  of  the  week.  (The  sales  are  also  affected  by 


holidays.)  We  are  interested  in  determining  whether  or  not  it  is  appropriate  to  include  a  trading 


day  effect  in  the  model  and  whether  or  not  the  globally  stochastic  AR  component  should  be 


included  in  the  model.  The  A1C  statistic  is  used  to  determine  the  best  of  the  alternative  models. 


The  computations  were  done  using  the  DECOMP. PORT  program  in  T1MSAC-84  (Akaike  et 


al.  1985).  The  model  was  fitted  to  the  data  ft, 24  data  points  were  witheld.  The  AIC’s  for 


AR  model  orders  p=0,l,2,3,  (as  in  (3.1.5)),  respectively  for  the  non  trading  day  effect  and  trading 
day  effect  models  are  (111. 51, 96. 96, 98. 80, 98. 65)  and  (88.07,68.34,67.23,68.91).  An  interpretation 
of  those  results  is  that  models  with  an  AR  component,  pit 0,  are  superior  to  models  without  AR 
components  both  with  and  without  trading  day  effects  and  that  the  AIC  best  model, (AIC=67. 23), 
is  the  trading  day  effect  model  with  AR  model  order  p=2.  Figures  3a-e  show  selected  compute* 
tional  responses  for  the  non-AR  component-trading  day  effect  model.  Figures  3f-j  show  selected 
computational  results  for  the  AR  component-trading  day  effects  model.  The  seasonal  components, 
residual  noise  and  trading  day  effects  and  seasonal  plus  trading  effects  are  quite  similar  in  appear¬ 
ance  for  both  models. 

Several  aspects  of  the  modeling  results  are  noteworthy.  The  trend  of  the  trend  plus  AR  com¬ 
ponent  model  is  smoother  than  the  trend-non  AR  component  model  and  the  trend  plus  AR  com¬ 
ponent  is  almost  indistinguishable  from  the  trend  in  the  non  AR  component  model.  Also,  the  sea¬ 
sonal  component  is  very  regular  whereas  the  seasonal  plus  trading  day  component  reveals  the 
expected  slight  irregularities. 

A  important  property  of  the  AIC  best  trend  plus  seasonal  plus  AR  component  model,  instead 
of  the  trend  plus  seasonal  model,  can  be  seen  in  the  (out-of-sample)  forecasts  for  these  models  as 
shown  in  Figures  3e  and  3j.  In  those  illustrations  we  show  the  true  series,  the  forecasted  series  and 
plus  and  minus  one  sigma  of  the  forecast  random  variable.  The  plus  or  minus  one  sigma  predic¬ 
tion  intervals  of  the  trend  plus  AR  plus  seasonal  plus  trading  day  components  model  is  much 
tighter  than  the  same  quantity  for  the  non  AR  component  model.  The  increase  in  prediction  vari¬ 
ance  per  step  in  increasing  horison  forecast,  grows  in  accordance  with  the  variance  component 
terms  in  the  matrix,  in  (3.1.8).  The  variance  of  the  (wiggly)  trend  term  in  the  non  AR  component 
model,  is  larger  than  the  sum  of  variance  terms  of  the  (smooth)  trend  and  AR  component  in  the 
AR  component  model.  That  larger  variance  is  reflected  into  the  larger  one  sigma  prediction  inter¬ 
val  of  the  non  AR  component  model. 


These  results  illustrate  the  flexibility  of  the  decomposition  of  the  nonstationary  mean  concept 
via  smoothness  priors  modeling  and  the  importance  of  the  role  of  the  AIC  in  selecting  the  best  of 
alternative  models. 

Time  Varying  AR  Coefficient  Modeling,  Nonstationary  Covariance  Time  Series 

The  computations  were  realised  using  the  TVCAR.FORT  program  in  TIMSAC-84  (Akaike 

et  al  1985).  Figure  4a  shows  a  seismic  data  event,  yt . y*.  The  stochastic  "background  noise",  P 

wave  (the  first  abrupt  change  in  the  signal)  and  the  S  wave  (the  second  abrupt  change  in  the  sig¬ 
nal),  are  clearly  discernable.  Figures  4bd,f  are  graphs  of  computational  results  from  the  time  vary¬ 
ing  AR  coefficient  model  described  in  Section  3.3.  Respectively  they  show  the 
l°g((ilim  +  v!m-i )<  m*l,...,JV/2  "unsmoothed  envelope"  data  and  the  superimposed  estimate  of 
the  envelope  (changing  variance),  the  evolution  of  the  instantaneous  power  spectral  density  and 
the  evolution  of  the  partial  correlation  coefficients  (parcors)  of  the  fitted,  AR  order  m  =  8  time 
varying  AR  coefficient  model.  Figures  4c,e,g  show  the  corresponding  computational  results  from 
an  "intervention  analysis"  model  that  is  part  of  TVCAR.FORT. 

The  smoothed  envelope  for  the  non-intervention  model  is  in  fact  quite  smooth.  Similarly  the 
instantaneous  spectrum  and  partial  correlation  coefficients  (parcors)  reflect  the  smooth  transition 
in  the  time  varying  AR  coeficients  model,  (3.3.2).  Two  visual  inspection  determined  "outlier" 
events  occur  at  n=835  and  n-1030.  They  correspond  to  the  arrival  times  of  the  P  and  S  waves 
and  are  identified  to  the  program,  by  human  intervention,  as  large  observation  variance  events. 
The  large  observation  variance  relaxes  the  priors  constraint  and  permits  the  AR  coefficients  and 
subsequently  derived  quantities  to  change  abruptly  at  those  instants.  The  "validity"  of  this  inter¬ 
vention  type  analysis  is  suggested  by  comparison  of  the  results  of  the  intervention  type  and  non¬ 
intervention  type  analysis.  The  properties  of  the  latter  "drift"  toward  the  former.  The  abrupt 
changes  in  the  appearance  of  the  envelope  function  and  in  the  instantaneous  spectra  and  parcors 
correspond  to  the  physical  interpretation  that  the  P  waves  and  S  waves  are  different  sources  of 


energy  at  the  observing  seismometer. 

S.5  COMMENTS,  DISCUSSION 

The  paper  by  Akaike  (1980)  motivated  our  work  in  smoothness  priors.  In  Akaike  (1980), 
computations  were  done  by  a  Householder  transformation  least  squares  algorithm  of  computa¬ 
tional  complexity  0(N*).  Brotherton  and  Gersch  (1981)  and  Kitagwa  (1981)  demonstrated  an 
equivalent  state  space  modeling  approach  for  the  linear  model  with  Gaussian  system  and  observa¬ 
tion  noise.  In  the  vicinity  of  the  maximised  likelihood,  the  likelihood  is  a  rather  flat  function  of 
the  hyperparameters.  This  fact  permits  a  relatively  coarse  grid  discrete  hyperparameter-likelihood 
search  procedure  to  determine  the  values  of  the  hyperparameters  that  tend  to  maximise  the  likeli¬ 
hood.  Such  a  procedure  preserves  the  O(JV)  computational  complexity  inherent  in  the  Kalman 
filter  computations.  A  computational  complexity  of  O(N)  version  of  that  method  was  subse¬ 
quently  applied  to  a  variety  of  nonstationary  mean  and  nonstationary  covariance  time  series 
modeling  problems,  (Gersch  and  Kitagawa  1983,1985,  Kitagawa  and  Gersch  1984,  1985b).  Varia¬ 
tions  of  the  procedures  in  those  papers  expressed  in  computer  programs  DECOMP.FORT  and 
TVCAR.FORT  (TIMSAC-84,  Akaike  et  al.  1985),  yielded  the  computational  results  shown  here. 

Potentially  many  more  combinations  and  extensions  of  the  models  shown  here  for  the 
modeling  of  nonstationary  mean  and  nonstationary  covariance  time  series  by  smoothness  priors 
methods,  are  possible.  For  example,  a  generalisation  of  the  regression  on  trading  days  com¬ 
ponents,  in  the  nonstationary  mean-decomposition  of  time  series  modeling,  could  take  into  account 
constant  coefficient  and/or  time  varying  coefficient  regressssion  on  other  time  series.  A  time  vary¬ 
ing  partial  AR  coefficients  variation  of  the  present  time  varying  AR  coefficient  mode)  for  the 
modeling  of  nonstationary  covariance  time  series,  has  already  been  implemented.  Another  poten¬ 
tial  variation  on  the  time  varying  AR  coefficient  model  would  be  to  estimate  the  full  nondiagonal 
mxm  system  noise  covariance  matrix  (the  matrix  of  hyperparameters).  Gersch  and  Kitagawa 
(1985),  an  application  of  the  time  varying  AR  coefficient  model,  includes  computation  of  the  time 


varying  covariance  function.  That  computation  is  useful  to  permit  computation  of  the  mean 
square  response  to  nonstationary  excitation  of  building  structures  to  single  realisations  of  seismic 
event  data. 

Some  other  linear  mod  el- Gaussian  disturbances-  state  space  smoothness  priors  models  have 
been  implemented.  Kitagawa  and  Takanami  (1985)  show  a  smoothness  priors  modeling  method 
for  the  extraction  of  seismic  signals  from  correlated  background  noise.  The  smoothness  priors 
innovation  in  that  work  is  the  implementation  of  a  non  constant  or  time  varying  hyperparameter. 
That  hyperparameter  achieves  a  time  varying  balance  of  the  tradeoff  between  the  variances  of  the 
seismic  signal  and  the  background  noise.  The  modeling  of  continuous  model  time  series  with 
discrete  time  observations  is  another  domain  where  smoothness  priors  state  space  modeling  has 
been  exhibited.  Kitagawa  (1984)  includes  a  smoothness  priors  variation  of  the  Jones  (1980)  con¬ 
tinuous  time  AR  process-discrete  time  observations  modeling.  The  application  in  Kitagawa  (1984) 
is  to  irregularly  spaced  or  missing  data  time  series  modeling. 


4.  STATE  SPACE  NON  GAUSSIAN  MODELING  OF  NONSTATIONARY  MEAN 
TIME  SERIES 

A  non- Gaussian  state  space  approach  to  the  modeling  of  nonstationary  time  series  is  shown. 
Neither  the  system  noise  nor  the  observation  noise  need  be  Gaussian.  Recursive  formulas  for  the 
prediction,  filtering  and  smoothing  of  the  state  are  given.  A  numerical  method,  based  on  a  piece- 
wise  linear  approximation  to  the  density  functions  for  realising  these  formulas,  is  also  given.  The 
merits  and  potential  wide  applicability  of  this  approach  to  non-Gaussian  modeling  are  illustrated 
by  some  numerical  examples.  Extension  of  this  method  to  the  state  space  modeling  of  nonlinear 
systems  is  straightforward. 

Earlier  in  this  chapter  we  demonstrated  the  wide  range  of  applicability  of  the  linear  model 
with  Gaussian  system  and  observation  noise.  There  are  numerous  problems  for  which  Gaussian 
modeling  is  inadequate.  For  example,  the  problem  of  trend  estimation  becomes  difficult  when  the 
trend  has  discontinuities  as  well  as  smooth  changes  and  when  there  are  observation  outliers.  A 
simple  linear  Gaussian  model  with  small  process  noise  variance  does  not  track  jumps  or  discon¬ 
tinuities  very  well.  A  model  with  large  process  noise  variance  will  respond  to  sudden  changes  in 
the  trend  but  it  will  also  be  inappropriately  wiggley  where  the  trend  is  quite  smooth.  The  treat- 
meant  of  such  trend  discontinuities  with  the  included  possibility  of  observation  outliers  in  the 
linear  Gaussian  model  framework  requires  a  complicated  model.  Heavy  tailed  distributions  for 
process  and  observation  noise  can  cope  with  these  problems  with  a  simple  model.  Also,  smoothing 
problems  in  which  there  is  a  time  varying  variance  and/or  a  nonhomogenous  binomial  or  Poisson 
mean  require  a  non  Gaussian  system  noise  model  formulation.  Similarly  nonlinear  models  such  as 
storage  models  for  riverflow  and  a  ship’s  nonlinear  manueverability  require  non  Gaussian  distribu¬ 
tion  models. 

Thus,  the  development  of  methods  for  treating  systems  with  non  Gaussian  distributions  is 
well  motivated.  In  earlier  attempts,  systems  with  non  Gaussian  distributions  were  approximated 
by  the  use  of  extended  Kalman  filters,  sums  of  Gaussian  distributions  ,  by  Edgeworth  or  Gram- 
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Charlier  expansion*  etc..  (See  Alspach  and  Sorenson  1972,  for  example.)  Here,  we  approximate 
the  probability  density  functions  directly  by  a  piecewise  linear  function.  The  recursive  prediction, 
filtering  and  smoothing  computation  required  by  the  state  space  modeling  are  realized  by  numeri¬ 
cal  integration.  A  similar  approach  was  considered  by  Bucy  and  Senne  (1971)  and  de  Figueiredo 
and  Jan  (1971)  in  the  context  of  nonlinear  filtering  problems.  Such  an  approach  is  more  feasible 
now  than  it  was  several  years  ago  because  of  the  development  and  proliferation  of  fast  computa¬ 
tional  facilities.  In  Section  4.1  the  state  space  prediction  ,  filtering  and  smoothing  formula  aspects 
of  the  numerical  computations  are  derived  and  the  computation  of  the  likelihood  for  the  not  neces¬ 
sarily  Gaussian  distribution  model  are  shown.  Numerical  examples  are  shown  in  Section  4.2  and  a 
discussion  and  comments  are  in  Section  4.3. 

4.1  THE  NON  GAUSSIAN  STATE  SPACE  MODEL 
Consider  the  stationary  state  space  system  described  by 

xm  ”  f’xm_t  +  (*-ll) 

Vn  “  HX,  +  C. 

where  as  before  F,  G,  and  H  are  linear  transformations.  The  independent  and  independent  of  each 
other,  but  not  necesssarily  Gaussian  process  and  observation  noises  are  v,  and  tm  respectively. 
The  initial  state  vector  z0  is  distributed  in  accordance  with  p(z0)  *»d  the  conditional  density  of 
the  state  at  time  a,  given  the  obervations  (ft,. -.,?«)  -  Ym  is  denoted  by  p(*J  Ym).  Then,  the 
recursive  formulas  for  the  one  step  ahead  prediction, filtering  and  smoothing  densities  are  derived 
as  follows: 


Filtering  (observation  update) 


P(*J  Ym)  -  p(za|  -  p(*a,ya|  K._i)/p(v.|  K.-,)  (4.1.3) 

*  P(».l  *,)p(*. I  y—i)/p(v.l  y»-i) 

where  p(za|  *,_,)  is  the  density  of  za  given  the  previous  state  vector  z._,,  p(yj  z„)  is  the  density  of 
r.  given  z.  and  p(yj  Ka_,)  is  obtained  by  J p(y.j  z„)p(z,|  K._,)dz.. 

Similarly,  consider  the  expression  for  the  joint  density  of  z„  and  za+J,  given  the  entire  obser¬ 
vation  sequence  YN, 

P(*»>*.+ ll  "  P(*.+il  *Ar)p(**l  *»+i.y.)  (414) 

“  P(*.+il  Y„)p(xm,za+ ,|  Ka)/p(za+1 1  K„) 

“  p(*«+il  yjr)p(*.+il  *»)p(*.I  yJ/p(*,+il  y«) 

From  (4.1.4)  we  obtain  the  formula  for  smoothing: 

P(*.l  Ym)  *  f_m  P(*..*.+il  *w)«**a+,  (4.1.5) 

-  P(*.l  y«)J_„P(*«+»l  y#)p(*»+il  *J/p(*.+iI  y-)*.+i- 

In  the  linear  Gaussian  case,  the  conditional  densities  p(za|  Ka_,),  p(za|  YN)  and  p(za|  YN) 
are  characterised  by  mean  vectors  and  covariance  matrices  .  Correspondingly,  (4.1.2), (4.1. 3)  and 
(4.1.5)  lead  to  the  Kalman  filter  and  the  fixed  interval  smoothing  algorithm.  In  the  non  Gaussian 
or  nonlinear  case  however,  it  is  necessary  to  evaluate  the  non  Gaussian  densities  explicitly  at  each 
step.  The  algorithms  above,  (4.1.3)-(4.1.5)  can  be  realised  numerically  by  piecewise  linear  approx¬ 
imations  to  the  density  functions,  transformation  of  densities,  convolution  of  densities  and  Bayes 
theorem  (product  of  two  densities  and  normalization).  Details  of  the  numerical  computations  are 
in  Kitagawa  (1987). 

In  general, the  non  Gaussian  model  has  some  unknown  parameters.  The  best  choice  of  the 
parameters  can  be  found  by  maximizing  the  log  likelihood  defined  by 
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1(f)  -  log  p(|f, -  £  log  p(p„|  -  E  log  p(p.|  K„-i).  (4.1.6) 

■  -1  Ml 

The  term  p(yj  Ka_t)  Appears  in  (4.1. 3)  and  can  be  evaluated  numerically.  If  we  have  several  can¬ 
didate  models,  including  models  with  different  types  of  system  noise  or  observation  noise  density 
functions,  we  choose  the  model  for  which  the  AIC  is  minimum. 


4.2  NUMERICAL  EXAMPLES 


Estimation  of  a  Shifting  Mean  Valne 

Consider  the  data  simulated  from  the  following  model, 


Ym  ~  JV(M.,1) 

i 

i 

M.  “ 

The  data  is  shown  in  Figure  5a.  The  problem  is  to  estimate  the  abruptly  changing  mean  value 
function  /*„. 


0  n=  1,...,100 

1  n-101,...,200 
-1  n«201,...,S00 

2  n-301,...,400 


(4.2.1) 


For  this  type  of  data  we  used  the  model 


V*<.  -  (4-2.2) 

»»  =  *.  + 

As  before,  V  i»  the  difference  operator  defined  by  *»-i  and  wm  and  e„  are  white  noise 

sequences  that  are  not  necessarily  normally  distributed.  For  simplicity  we  assume  that  the 
difference  order  k  is  one.  Equation  (4.2.2)  is  a  special  form  of  the  state  space  model,  Section  (3.1.), 
with  *a-ta,F—  C“ff*l.  We  considered  the  following  model  classes: 

» 

i 

J  Model(a):  w,  -  oJV(0, **)+(!  -  a)A(0,rf),  tm  ~  Af(0,l)  (4.2.3) 

j  Model  (b):  wH  ~  Q(b,r 3),  e.  ~  AT(0,l) 


Model  (a )  denotes  a  mixture  of  Gaussian  system  noises.  In  (4.2.3),  for  Model(o)  and  Model(b), 
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AT(0,r)  denotes  the  Gaussian  distribution  with  mean  0  and  variance  r.  In  Modti[b ),  Q(b,r) 
denotes  the  distribution  of  the  Pearson  system  with  density  q(x,b,i3)^C(i3  +  x1)-*  with  y<6^oo 

and  <7=r3*-1r(6)/r(-^-).  This  family  links  the  Cauchy  distribution  (6  =  1)  and  the  Gaussian  distri¬ 
bution  (6=oo).  In  Model(a),  rj  was  arbitrarily  set  to  4.0,  approximately  the  sample  variance  of 
the  simulated  data.  The  maximum  likelihood  estimate  of  i3  for  the  Gaussian  model,  Model(a), 
with  a  =  1.0  or  equivalently  Model(b)  with  6  -  oo,  was  f*=0.0429.  The  AIC  of  the  model  was 
1240.33.  For  the  mixture  of  Gaussian  system  noises  model,  <£=0.989,  ?*=0. 0000014  and 
AIC=  1212.48.  We  tried  four  Pearson  family  models:  6=0.6, 0.8, 1.0  and  oo.  6=0.80  is  the  AIC 
best  Pearson  family  model  with  f* =0.000002  and  AIC=1215.20.  The  AIC  best  model  is  the  mix¬ 
ture  of  Gaussian  system  noises  model. 

Figure  5b-5d  shows  the  marginal  posterior  density  p(zKj  Yu)  versus  time  n  for  the  Gaussian 
model,  the  best  Pearson  system  model  and  the  mixture  of  Gaussians  model  respectively.  For  the 
Gaussian  model,  Figure  5b,  the  densities  obtained  have  identical  shape  except  for  the  ends  of  the 
time  interval  where  the  densities  become  slightly  broader.  In  Figure  5c,  the  shape  of  the  posterior 
density  varies  with  time.  When  the  mean  value  shifts,  the  density  becomes  heavy  tailed  on  one 
side.  The  Gaussian  mixtures  model  also  exhibits  the  latter  behavior. 

Figures  5e-5g  shows  the  mean  (bold)  and  ±  1,2,3  sigma  intervals  of  the  p(xj  YN)  versus  n 
for  the  Gaussian  model  and  the  median  (bold)  and  corresponding  0.13,2.27,15.87,84.13,97.73  and 
99.87  percentage  points  of  the  Pearson  system  model  and  the  Gaussian  mixtures  model  respec¬ 
tively.  For  the  Gaussian  model,  the  estimated  mean  value  function  becomes  a  wiggly  curve  and 
does  not  reflect  the  sudden  change  of  the  mean  value.  The  estimated  median  of  the  Pearson  sys¬ 
tem  and  the  Gaussian  mixtures  models  do  capture  the  sudden  change  of  the  signal  mean  value. 
The  multimodal  or  skewed  distribution  and  jumps  of  the  mean  value  are  typical  of  the 
phenomenon  that  are  seen  in  non  Gaussian  modeling. 
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Estimation  Of  Changing  Variance 


The  estimation  of  changing  variance  was  discussed  in  Section  3.3.3  in  the  context  of  fitting  a 
time  varying  AR  coefficient  model  to  a  seismic  signal.  The  same  idea  is  exploited  here  to  estimate 
the  changing  variance  of  the  same  seismic  signal  with  the  state  space  non  Gaussian  modeling 
method.  A  first  order  difference  model  for  the  trend  of  the  log  of  the  sum  of  the  squares  of  succes¬ 
sive  observations  was  used,  +  tm  with  m=l,..,N/2.  Two  models 

were  considered,  the  Gaussian  system  noise-Gaussian  observation  noise  and  the  Cauchy  system 
noise  with  an  r(z)*=ezp(z-czp‘)  observation  noise  model.  The  corresponding  AlC’s  were  4778.94 
and  4222.84.  The  latter  model  was  the  AIC  beat  model.  The  original  seismic  wave  y„,n  =  l,...,JV 
and  the  fog((y|OT  +  Vj*-i)/2),  m=l,...,N/2  signals  are  shown  in  Figures  6a  and  6b  respectively. 
The  Gaussian  model  smoothed  mean  and  ±1,2, So  and  non  Gaussian  model  smoothed  median  and 
corresponding  probability  point  curves  are  shown  in  Figures  6c, 6d  respectively.  Those  illustrations 
indicate  that  the  Cauchy  system  noise  model  yields  better  estimates  of  the  smooth  mean  and 
abrupt  changes  of  the  mean  innovations  variance  than  the  simple  Gaussian  system  noise  model. 
Modeling  the  real  data  changing  variance  seismic  signal  with  the  state  space  non  Gaussian  system 
noise  method  automatically  yields  the  location  of  abrupt  changes  in  the  mean  of  the  siganal. 

4.3  COMMENTS,  DISCUSSION 

The  results  shown  here  were  obtained  using  a  simple  one  dimensional  state  vector.  In  princi¬ 
ple,  it  is  straightforward  to  extend  the  computational  formulas  to  higher  dimensional  state  sys¬ 
tems.  The  resulting  increase  in  the  computational  burden  required  to  compute  the  convolution  of 
density  functions  becomes  quite  severe.  A  variety  of  numerical  techniques  have  been  investigated 
to  cope  with  this  problem.  Very  likely  the  use  of  more  powerful  computers  rather  than  the 
increase  of  effort  in  numerical  analysis  methods  will  be  more  expeditious  in  the  development  of  the 
non  Gaussian  state  space  smoothness  priors  method. 
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Several  other  problems  lend  themselves  to  the  application  of  the  one  dimensional  non  Gaus¬ 
sian  modeling  shown  here.  Kitagawa  (1987)  shows  an  application  to  the  handling  of  discrete  dis¬ 
tributions.  The  time  varying  mean  of  a  real,  nonstationary  (nonhomogeneous)  binary  process,  is 
estimated.  Also,  smoothing  of  the  log  periodogram  using  a  state  space  model  with  wa~logx'2  and 
en~Cauchy  or  f Gaussian  is  an  alternative  to  the  Gaussian  distributions  approach  in  Wahba 
(1980). 

Our  procedure  also  extends  quite  naturally  to  the  analysis  of  nonlinear  systems.  The  one- 
step-ahead  prediction  formula  (4.1.2)  and  the  filtering  formula  (4.1.3)  are  applicable  even  for  non¬ 
linear  systems. 

Time  series  with  nonstationarities,  nonlinearities  and  outliers  that  have  been  difficult  to 
analyze  by  conventional  linear  Gaussian  models  can  be  quite  simply  analyzed  with  the  non  Gaus¬ 
sian  model.  The  computational  burden  using  the  non  Gaussian  filter  and  smoother  is  substantial. 
The  develpment  of  faster  algorithms  and  the  use  of  faster  computers  will  alleviate  this  burden. 


5.  SUMMARY 


The  ingredients  for  the  smoothness  priors  analysis  of  time  series  are  the  model,  the  prescrip¬ 
tion  of  the  priors,  the  criterion  for  goodness  of  model  fit  and  the  computational  method. 

Initially,  smoothness  priors  modeling  of  stationary,  nonstationary  mean,  and  nonstationary 
covariance  time  series  was  demonstrated  in  the  context  of  the  linear  model  with  Gaussian  distri¬ 
buted  system  noise  and  with  Gaussian  distributed  observation  noise.  Both  time  domain  and  fre¬ 
quency  domain  specifications  of  the  prior  distribution  of  the  model  parameters  were  considered.  A 
hyperparameter  specifies  the  degree  of  belief  in  the  prior  distribution.  The  smoothness  priors 
method  of  analysis  derives  its  unity  from  the  fact  that  the  likelihood  of  the  Bayesian  model  (the 
likelihood  of  the  hyperparameter(s)}  is  the  single  criterion  by  which  the  goodness  of  fit  of  the 
model  is  determined.  The  maximization  of  the  likelihood  of  a  small  number  of  hyperparameters 
permits  the  modeling  of  time  series  with  complex  structure  and  a  large  number  of  implicitly 
inferrred  parameters.  When  there  are  alternative  candidate  smoothness  priors  models,  we  use 
Akaike’s  AIC  to  determine  the  best  of  alternative  models,  (the  likelihood  of  the  model  has  a  cen¬ 
tral  role  in  the  AIC).  Householder  transformation  least  squares  and  Kalman  filter  algorithms,  were 
the  means  for  the  realisation  of  the  smoothness  priors  time  series  modeling. 

Finally,  we  demonstrated  a  state  space  representation  not  necessarily  Gaussian  not  neces¬ 
sarily  linear  model  method  of  smoothness  priors  modeling.  In  that  method,  piecewise- linear 
approximation  to  densities  and  numerical  integration  computations  were  employed.  Conceptually 
all  of  the  possible  combinations  of  models  and  smoothness  priors  computations  could  be  realised 
with  this  method. 

The  extensive  applicability  of  smoothness  priors  modeling  methods  in  time  series  modeling 
was  demonstrated.  A  large  number  of  other  problems  that  have  not  been  well  solved  by  more 
traditional  time  series  methods  remain  to  be  solved  by  that  method. 
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FIGURE  1.  Trend  Estimation 

a:  Truncated  Gaussian  signal  and  signal  plus  noise,  b:  Signal  plus  noise  plus  smoothed  trend  with 
a  too  large  hyperparameter,  c:  Signal  plus  noise  plus  smoothed  trend  with  a  too  small  hyperparam¬ 
eter,  d:  Signal  plus  noise  plus  smoothed  trend  with  optimum  hyperparameter, 

Figure  2.  (2.1)Spectral  Densities  From  Canadian  Lynx  Data  Example 

a:  Spectral  density  versus  frequency,  smoothness  priors  model,  b:  Spectral  density  versus  frequency, 
AIC-AR  model,  c:  Superposition  of  spectral  densities  versus  frequency,  AR  models. 

FIGURE  3.  (S.l)Nonstationary  Mean,  RSWOMEN  Data. 

Trend  plus  seasonal  plus  trading  component  model,  a,b,c,d,e. 

a:  Original  data  and  trend,  b:  Seasonal  component,  c:  Trading  day  effect,  d:  Seasonal  plus  trading 
day  effect,  e:  True,  predicted  and  plus  and  minus  one  sigma  plus  predicted. 

Trend  plus  seasonal  plus  AR  plus  trading  component  model,  f,g,h,ij.  f:  Original  data  and  trend, 
g:  AR  component,  h:  Original  plus  trend  plus  AR  component,  i:  Residual  noise,  j:  True,  predicted 
and  plus  and  minus  one  sigma  plus  predicted. 

FIGURE  4.  Nonstationary  Covariance,  Seismic  Data. 

a.  Seismic  data,?] . y„.  "ordinary  model"  b,d,f;  "intervention  model"  c,e,g. 

b, c:  log{{ylm  +  yjm_ l)/2),  m«l,...,N/2  data  and  smoothed  envelope,  d,e:  instantaneous  power 
spectral  density,  f,g:  parcors. 

FIGURE  5.  State  Space  Model  Non  Gaussian  Discontinuous  Trend  Example, 
a:  Abruptly  changing  trend  data,  b:  Smoothed  state  estimate,  Gaussian  system  noise  model,  c: 
Smoothed  state  estimate,  Pearson  system-system  noise  model,  d:  Smoothed  state  estimate,  Gaus¬ 
sian  mixture  system  noise  smodel,  e:  Posterior  mean,  ±l,2,3<r,  Gaussian  system  noise  model,  f: 
Posterior  median  and  (0.13,2.27,15.87,84.13,97.73)  percentage  points,  Pearson  system-system  noise 


model,  g:  Posterior  median  and  (0. IS, 2. 27, 15. 87, 84. 13,97.73)  percentage  points,  Gaussian  system 
noise  model. 

FIGURE  6.  State  Space  Model  Non  Gaussian  Envelope  of  Seismic  Signal  Example, 
a:  Seismic  data.y,,...,^.  b:  +  Vj»-i)/2),  m-l,...,JV/2  "envelope”  data,  c:  Posterior 

mean,  ±1,2, 3<r,  Gaussian  disturbances  model,  d:  Posterior  mode,  (0.13,2.27,15.87,84.13,97.73)  per¬ 
centage  points,  non-Gaussian  disturbances  model. 
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